ValWorkBench: An open source Java library for cluster validation, with applications to microarray data analysis
Abstract
Predicting the number of clusters in a dataset, microarray data in particular, is a fundamental task in biological data analysis, usually performed via validation measures. Unfortunately, the task has received very little attention, and there is a growing need for software tools and libraries dedicated to it. Here we present ValWorkBench, a software library consisting of eleven well-known validation measures, together with novel heuristic approximations for some of them. The main objective of this paper is to provide the interested researcher with the full software documentation of an open source cluster validation platform whose main features are that it is easily extendible in a homogeneous way and that it offers software components that can be readily re-used. Consequently, the presentation focuses on the architecture of the library, since it provides an essential map for accessing the full software documentation, which is available at the supplementary material website [1]. These main features of ValWorkBench are also discussed and exemplified, with emphasis on software abstraction design and re-usability. A comparison with existing cluster validation software libraries, mainly in terms of the same features, is also offered. It suggests that ValWorkBench is a much needed contribution to the microarray software development/algorithm engineering community. For completeness, we note that previous accurate experimental analyses of the relative merits of each of the implemented measures [19,23,25], carried out specifically on microarray data, give researchers in the microarray community useful insights into the effectiveness of ValWorkBench for cluster validation.
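To make the idea of a homogeneous, extendible set of validation measures concrete, here is a minimal Java sketch. It is not ValWorkBench code: the names ValidationSketch, ValidationMeasure and Wcss are hypothetical, and the abstract does not say which eleven measures the library implements. The sketch assumes only that every measure can share one contract (data, cluster labels and k in, one score out), and uses the within-cluster sum of squares as an example measure.

```java
/** Hypothetical sketch; none of these names are taken from ValWorkBench. */
final class ValidationSketch {

    /** Assumed common contract: every measure maps a clustering to one score. */
    interface ValidationMeasure {
        double score(double[][] data, int[] labels, int k);
    }

    /** Within-cluster sum of squares (WCSS); lower means tighter clusters. */
    static final class Wcss implements ValidationMeasure {
        @Override
        public double score(double[][] data, int[] labels, int k) {
            int dim = data[0].length;
            double[][] centroid = new double[k][dim];
            int[] size = new int[k];
            // Accumulate per-cluster coordinate sums and cluster sizes.
            for (int i = 0; i < data.length; i++) {
                size[labels[i]]++;
                for (int d = 0; d < dim; d++) {
                    centroid[labels[i]][d] += data[i][d];
                }
            }
            // Turn sums into means, skipping empty clusters.
            for (int c = 0; c < k; c++) {
                if (size[c] == 0) continue;
                for (int d = 0; d < dim; d++) {
                    centroid[c][d] /= size[c];
                }
            }
            // Sum squared distances of each point to its own centroid.
            double total = 0.0;
            for (int i = 0; i < data.length; i++) {
                for (int d = 0; d < dim; d++) {
                    double diff = data[i][d] - centroid[labels[i]][d];
                    total += diff * diff;
                }
            }
            return total;
        }
    }

    public static void main(String[] args) {
        // Toy data with two obvious groups.
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        ValidationMeasure wcss = new Wcss();
        System.out.println(wcss.score(data, new int[]{0, 0, 0, 0}, 1)); // prints 201.0
        System.out.println(wcss.score(data, new int[]{0, 0, 1, 1}, 2)); // prints 1.0
    }
}
```

Under this assumed design, adding a measure means writing one more class that implements the interface. The sharp drop in the score from k = 1 to k = 2 on the toy data is the kind of signal a validation measure uses to suggest the number of clusters.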
Similar resources
JaTeCS an open-source JAva TExt Categorization System
JaTeCS is an open source Java library that supports research on automatic text categorization and other related problems, such as ordinal regression and quantification, which are of special interest in opinion mining applications. It covers all the steps of an experimental activity, from reading the corpus to the evaluation of the experimental results. As JaTeCS is focused on text as the main i...
Software support for SBGN maps: SBGN-ML and LibSBGN
Motivation: LibSBGN is a software library for reading, writing and manipulating Systems Biology Graphical Notation (SBGN) maps stored using the recently developed SBGN-ML file format. The library (available in C++ and Java) makes it easy for developers to add SBGN support to their tools, whereas the file format facilitates the exchange of maps between compatible software applications. The librar...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
pobj: A Lightweight Persistent Objects Library and Its Application to Persistency in Titanium/Java
Persistent objects are useful for applications that require data structures to be maintained across multiple executions. This paper describes pobj, a lightweight facility for providing persistent objects. The library offloads the actual backing store management to lladd, an open-source implementation of the ARIES recovery algorithm, and memory management to external libraries. This layered appr...
LibSBML: an API Library for SBML
LibSBML is an application programming interface library for reading, writing, manipulating and validating content expressed in the Systems Biology Markup Language (SBML) format. It is written in ISO C and C++, provides language bindings for Common Lisp, Java, Python, Perl, MATLAB and Octave, and includes many features that facilitate adoption and use of both SBML and the library. Dev...
Journal title:
- Computer methods and programs in biomedicine
Volume 118, Issue 2
Pages -
Publication date: 2015